RL-based method
Curriculum Offline Imitating Learning
Offline reinforcement learning (RL) tasks require the agent to learn from a pre-collected dataset without further interaction with the environment. Despite the potential to surpass the behavioral policies, RL-based methods are generally impractical due to training instability and the bootstrapping of extrapolation errors, which typically require careful hyperparameter tuning via online evaluation. In contrast, offline imitation learning (IL) avoids these issues since it learns the policy directly, without estimating a value function by bootstrapping. However, IL is usually limited by the capability of the behavioral policy and tends to learn mediocre behavior from datasets collected by a mixture of policies. In this paper, we aim to retain the advantages of IL while mitigating this drawback. Observing that behavior cloning can imitate neighboring policies with less data, we propose \textit{Curriculum Offline Imitation Learning (COIL)}, which uses an experience-picking strategy to make the agent imitate adaptively chosen neighboring policies with higher returns, improving the current policy over curriculum stages. On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that COIL not only avoids merely learning mediocre behavior on mixed datasets but is even competitive with state-of-the-art offline RL methods.
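To make the curriculum idea above concrete, here is a minimal Python sketch of one curriculum stage: trajectories that outperform the current policy's return floor are ranked by their likelihood under the current policy (a proxy for coming from a neighboring policy) and then behavior-cloned. The `policy.log_prob`/`policy.fit` interface and the selection rule are illustrative assumptions, not COIL's exact criterion.

```python
import numpy as np

def coil_stage(policy, trajectories, return_floor, top_k=10):
    """One curriculum stage: pick trajectories that are close to the
    current policy but achieve a higher return, then behavior-clone them.
    `policy.log_prob(s, a)` and `policy.fit(pairs)` are assumed interfaces."""
    # Keep only trajectories that outperform the current policy's level.
    candidates = [t for t in trajectories if t["return"] > return_floor]

    # Rank candidates by how likely the current policy is to have produced
    # them (a proxy for coming from a "neighboring" policy).
    def closeness(traj):
        return np.mean([policy.log_prob(s, a)
                        for s, a in zip(traj["states"], traj["actions"])])

    candidates.sort(key=closeness, reverse=True)
    picked = candidates[:top_k]

    # Behavior cloning on the picked experiences.
    pairs = [(s, a) for t in picked
             for s, a in zip(t["states"], t["actions"])]
    policy.fit(pairs)

    # Raise the return floor for the next curriculum stage.
    if picked:
        return_floor = max(return_floor, min(t["return"] for t in picked))
    return policy, return_floor
```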
A Comparative Study of OpenMP Scheduling Algorithm Selection Strategies
Korndörfer, Jonas H. Müller, Mohammed, Ali, Eleliemy, Ahmed, Guilloteau, Quentin, Krummenacher, Reto, Ciorba, Florina M.
Scientific and data science applications are becoming increasingly complex, with growing computational and memory demands. Modern high performance computing (HPC) systems provide high parallelism and heterogeneity across nodes, devices, and cores. To achieve good performance, effective scheduling and load balancing techniques are essential. Parallel programming frameworks such as OpenMP now offer a variety of advanced scheduling algorithms to support diverse applications and platforms. This creates an instance of the scheduling algorithm selection problem, which involves identifying the most suitable algorithm for a given combination of workload and system characteristics. In this work, we explore learning-based approaches for selecting scheduling algorithms in OpenMP. We propose and evaluate expert-based and reinforcement learning (RL)-based methods, and conduct a detailed performance analysis across six applications and three systems. Our results show that RL methods are capable of learning high-performing scheduling decisions, although they require significant exploration, with the choice of reward function playing a key role. Expert-based methods, in contrast, rely on prior knowledge and involve less exploration, though they may not always identify the optimal algorithm for a specific application-system pair. By combining expert knowledge with RL-based learning, we achieve improved performance and greater adaptability. Overall, this work demonstrates that dynamic selection of scheduling algorithms during execution is both viable and beneficial for OpenMP applications. The approach can also be extended to MPI-based programs, enabling optimization of scheduling decisions across multiple levels of parallelism.
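As an illustration of runtime scheduling-algorithm selection, the following Python sketch treats the choice among OpenMP schedule kinds as a multi-armed bandit whose reward is the negated loop execution time. This is a simplified stand-in for the expert- and RL-based selectors evaluated in the paper; the schedule names and the epsilon-greedy rule are assumptions for the sketch.

```python
import random

# Candidate OpenMP loop schedules to choose among at runtime
# (a real selector would apply the choice via OMP_SCHEDULE or
# omp_set_schedule before the parallel loop executes).
SCHEDULES = ["static", "dynamic", "guided", "auto"]

class EpsilonGreedySelector:
    """Minimal epsilon-greedy bandit over scheduling algorithms:
    faster measured loop time means higher reward."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {s: 0 for s in SCHEDULES}
        self.avg_reward = {s: 0.0 for s in SCHEDULES}

    def pick(self):
        if random.random() < self.epsilon:
            return random.choice(SCHEDULES)  # explore
        return max(SCHEDULES, key=lambda s: self.avg_reward[s])  # exploit

    def update(self, schedule, elapsed_seconds):
        r = -elapsed_seconds  # faster loop = higher reward
        self.counts[schedule] += 1
        n = self.counts[schedule]
        # Incremental mean update of the schedule's average reward.
        self.avg_reward[schedule] += (r - self.avg_reward[schedule]) / n
```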
- Europe > Switzerland > Basel-City > Basel (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (4 more...)
SwitchMT: An Adaptive Context Switching Methodology for Scalable Multi-Task Learning in Intelligent Autonomous Agents
Devkota, Avaneesh, Putra, Rachmad Vidya Wicaksana, Shafique, Muhammad
The ability to train intelligent autonomous agents (such as mobile robots) on multiple tasks is crucial for adapting to dynamic real-world environments. However, state-of-the-art reinforcement learning (RL) methods excel only in single-task settings and still struggle to generalize across multiple tasks due to task interference. Moreover, real-world environments also demand that agents process data streams. Toward this, a state-of-the-art work employs Spiking Neural Networks (SNNs) to improve multi-task learning by exploiting temporal information in data streams while enabling low-power/energy event-based operations. However, it relies on fixed context/task-switching intervals during training, limiting the scalability and effectiveness of multi-task learning. To address these limitations, we propose SwitchMT, a novel adaptive task-switching methodology for RL-based multi-task learning in autonomous agents. Specifically, SwitchMT employs the following key ideas: (1) a Deep Spiking Q-Network with active dendrites and a dueling structure that utilizes task-specific context signals to create specialized sub-networks; and (2) an adaptive task-switching policy that leverages both rewards and the internal dynamics of the network parameters. Experimental results demonstrate that SwitchMT achieves superior performance in multi-task learning compared to state-of-the-art methods. It achieves competitive scores in multiple Atari games (i.e., Pong: -8.8, Breakout: 5.6, and Enduro: 355.2) compared to the state-of-the-art, showing better generalized learning capability. These results highlight the effectiveness of our SwitchMT methodology in addressing task interference while enabling multi-task learning automation through adaptive task switching, paving the way for more efficient generalist agents with scalable multi-task learning capabilities.
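A hedged sketch of what an adaptive task-switching criterion could look like, in the spirit of SwitchMT's use of rewards plus internal network dynamics: switch tasks when the reward curve flattens and recent gradient norms are small. The window size and thresholds are illustrative, not the paper's.

```python
import numpy as np

def should_switch_task(reward_history, grad_norms, window=50,
                       reward_slope_tol=1e-3, grad_tol=1e-2):
    """Switch to the next task once the reward curve flattens AND the
    network parameters stop changing meaningfully. All thresholds here
    are illustrative assumptions."""
    if len(reward_history) < window:
        return False
    recent = np.asarray(reward_history[-window:])
    # Slope of a least-squares line through the recent rewards.
    slope = np.polyfit(np.arange(window), recent, deg=1)[0]
    reward_flat = abs(slope) < reward_slope_tol
    # Small recent gradient norms suggest the parameters have settled.
    params_settled = np.mean(grad_norms[-window:]) < grad_tol
    return reward_flat and params_settled
```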
Bayesian Critique-Tune-Based Reinforcement Learning with Adaptive Pressure for Multi-Intersection Traffic Signal Control
Duan, Wenchang, Gao, Zhenguo, He, Jiwan, Xian, Jinguo
Adaptive Traffic Signal Control (ATSC) is a critical component of intelligent transportation, with the capability to significantly alleviate urban traffic congestion. Although reinforcement learning (RL)-based methods have demonstrated promising performance for ATSC, existing methods are still prone to producing unreasonable policies. Therefore, this paper proposes BCT-APLight, a novel Bayesian Critique-Tune-based reinforcement learning method with Adaptive Pressure for multi-intersection signal control. In BCT-APLight, the Critique-Tune (CT) framework, a two-layer Bayesian structure, is designed to rein in excessive trust in RL policies. Specifically, the Bayesian-inference-based Critique Layer provides effective evaluations of the credibility of policies, while the Bayesian-decision-based Tune Layer fine-tunes policies by minimizing posterior risks when the evaluations are negative. Meanwhile, an attention-based Adaptive Pressure (AP) mechanism is designed to effectively weight the vehicle queues in each lane, thereby enhancing the rationality of the traffic movement representation within the network. Equipped with the CT framework and the AP mechanism, BCT-APLight effectively enhances the reasonableness of RL policies. Extensive experiments conducted with a simulator across a range of intersection layouts demonstrate that BCT-APLight is superior to other state-of-the-art (SOTA) methods on seven real-world datasets. Specifically, BCT-APLight decreases average queue length by \textbf{9.60\%} and average waiting time by \textbf{15.28\%}.
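The following sketch illustrates an attention-style Adaptive Pressure in Python: per-lane queue lengths are weighted by softmax attention scores derived from lane features. The featurization and scoring are assumptions for the sketch; the paper's AP mechanism is more elaborate.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_pressure(queue_lengths, lane_features, w):
    """Illustrative attention-style Adaptive Pressure: instead of summing
    raw per-lane queues (as in classic max pressure), weight each lane's
    queue by an attention score computed from its features.
    `lane_features` is (n_lanes, d) and `w` is a learned (d,) vector;
    both are assumptions for this sketch."""
    scores = lane_features @ w   # one scalar score per lane
    attn = softmax(scores)       # normalized attention weights
    return float(np.dot(attn, queue_lengths))
```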
- North America > United States > New York (0.05)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- (4 more...)
- Research Report (1.00)
- Overview (0.68)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
Li, Wendi, Wei, Wei, Xu, Kaihe, Xie, Wenfeng, Chen, Dangyang, Cheng, Yu
To meet the requirements of real-world applications, it is essential to control the generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation, but most existing methods suffer from overfitting (fine-tuning-based methods) or semantic collapse (post-processing methods). Moreover, current RL methods are generally guided by coarse-grained (sentence/paragraph-level) feedback, which may lead to suboptimal performance owing to semantic twists or progressions within sentences. To tackle this, we propose a novel reinforcement learning algorithm named TOLE, which formulates TOken-LEvel rewards for controllable text generation and employs a "first-quantize-then-noise" paradigm to enhance the robustness of the RL algorithm. Furthermore, TOLE can be flexibly extended to multiple constraints with little computational expense. Experimental results show that our algorithm achieves superior performance on both single-attribute and multi-attribute control tasks. We have released our code at https://github.com/WindyLee0822/CTG
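A minimal sketch of a "first-quantize-then-noise" step as described above, assuming per-token attribute scores in [0, 1]: scores are bucketed into discrete reward levels and then perturbed with Gaussian noise. The bin count and noise scale are illustrative, and the paper's exact formulation may differ.

```python
import numpy as np

def quantize_then_noise(token_scores, n_bins=5, noise_std=0.1, rng=None):
    """Per-token attribute scores are quantized into discrete reward
    levels, then perturbed with Gaussian noise so the policy cannot
    overfit to exact reward values. All parameters are illustrative."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(token_scores, dtype=float)
    # Quantize scores in [0, 1) into n_bins levels, rescaled back to [0, 1].
    levels = np.floor(np.clip(scores, 0.0, 1.0 - 1e-9) * n_bins) / (n_bins - 1)
    # Add noise to the discrete reward levels.
    return levels + rng.normal(0.0, noise_std, size=levels.shape)
```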
- Europe > Germany (0.04)
- Europe > France (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (15 more...)
Data Might be Enough: Bridge Real-World Traffic Signal Control Using Offline Reinforcement Learning
Applying reinforcement learning (RL) to traffic signal control (TSC) has become a promising solution. However, most RL-based methods focus solely on optimization within simulators and give little thought to real-world deployment. Online RL-based methods require interaction with the environment, which is heavily restricted in the real world, and acquiring an offline dataset for offline RL is likewise challenging in practice. Moreover, most real-world intersections prefer a cyclical phase structure. To address these challenges, we propose: (1) a cyclical offline dataset (COD), designed based on common real-world scenarios to facilitate easy collection; (2) an offline RL model called DataLight, capable of learning satisfactory control strategies from the COD; and (3) a method called Arbitrary To Cyclical (ATC), which can transform most RL-based methods into cyclical signal control. Extensive experiments using real-world datasets on simulators demonstrate that: (1) DataLight outperforms most existing methods and achieves results comparable to the best-performing method; (2) introducing ATC into some recent RL-based methods achieves satisfactory performance; and (3) COD is reliable, with DataLight remaining robust even with a small amount of data. These results suggest that a cyclical offline dataset might be enough for offline RL in TSC. Our proposed methods make significant contributions to the TSC field and successfully bridge the gap between simulation experiments and real-world applications. Our code is released on GitHub.
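One plausible reading of an Arbitrary-To-Cyclical wrapper, sketched in Python: a policy that would pick any phase is reduced to a keep-or-advance decision over a fixed phase cycle. The interface and the reduction rule are assumptions for illustration, not the paper's definition of ATC.

```python
def arbitrary_to_cyclical(choose_phase, cycle, current_idx, state):
    """Constrain an arbitrary phase-choosing policy to a fixed cycle:
    if the policy prefers the current phase, keep it; otherwise advance
    to the next phase in the cycle. `choose_phase(state)` returning a
    phase id is an assumed interface for this sketch."""
    preferred = choose_phase(state)           # unconstrained phase choice
    if preferred == cycle[current_idx]:
        return current_idx                    # keep the current phase
    return (current_idx + 1) % len(cycle)     # otherwise step the cycle
```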
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- North America > United States > New York (0.04)
- Asia > China > Gansu Province > Lanzhou (0.04)
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.48)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
Expression is enough: Improving traffic signal control with advanced traffic state representation
Zhang, Liang, Wu, Qiang, Shen, Jun, Lü, Linyuan, Wu, Jianqing, Du, Bo
Recently, finding fundamental properties for traffic state representation has become more critical than designing complex algorithms for traffic signal control (TSC). In this paper, we (1) present a novel, flexible, and straightforward method, advanced max pressure (Advanced-MP), which takes both running and queueing vehicles into consideration to decide whether to change the current phase; (2) design a novel traffic movement representation, namely advanced traffic state (ATS), built on the efficient pressure and effective running vehicles from Advanced-MP; and (3) develop an RL-based algorithm template, Advanced-XLight, by combining ATS with current RL approaches, generating two RL algorithms, "Advanced-MPLight" and "Advanced-CoLight". Comprehensive experiments on multiple real-world datasets show that: (1) Advanced-MP outperforms baseline methods and is efficient and reliable for deployment; and (2) Advanced-MPLight and Advanced-CoLight achieve new state-of-the-art performance. Our code is released on GitHub.
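A hedged Python sketch of an Advanced-MP style decision rule: each phase is scored by the queueing plus effective running vehicles it would serve, and the signal switches only when another phase's demand clearly exceeds the current one's. The exact scoring and threshold in the paper may differ.

```python
def advanced_mp_decision(queueing, running, current_phase, phases, threshold):
    """Score each phase by the demand it would serve, counting both
    queueing vehicles and effective running vehicles; change phase only
    when another phase's demand exceeds the current one's by `threshold`.
    `phases` maps each phase id to the incoming lanes it serves;
    `queueing` and `running` map lane ids to vehicle counts."""
    def demand(phase):
        lanes = phases[phase]
        return sum(queueing[l] + running[l] for l in lanes)

    best = max(phases, key=demand)
    if demand(best) > demand(current_phase) + threshold:
        return best            # switch to the higher-demand phase
    return current_phase       # otherwise keep the current phase
```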
- North America > United States (0.14)
- Asia > China > Zhejiang Province > Hangzhou (0.06)
- Oceania > Australia > New South Wales > Wollongong (0.04)
- (3 more...)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
Efficient Pressure: Improving efficiency for signalized intersections
Wu, Qiang, Zhang, Liang, Shen, Jun, Lü, Linyuan, Du, Bo, Wu, Jianqing
Since conventional approaches cannot adapt to dynamic traffic conditions, reinforcement learning (RL) has attracted increasing attention for solving the traffic signal control (TSC) problem. However, existing RL-based methods are rarely deployed, as they are neither cost-effective in terms of computing resources nor more robust than traditional approaches, which raises a critical research question: how can we construct an adaptive controller for TSC with less training and reduced complexity based on an RL approach? To address this question, in this paper we (1) innovatively specify the traffic movement representation as a simple but efficient pressure of vehicle queues in a traffic network, namely efficient pressure (EP); (2) build a traffic signal settings protocol, including phase duration, signal phase number, and EP for TSC; (3) design a TSC approach based on the traditional max pressure (MP) approach, namely efficient max pressure (Efficient-MP), using EP to capture the traffic state; and (4) develop a general RL-based TSC algorithm template, efficient XLight (Efficient-XLight), under EP. Through comprehensive experiments on multiple real-world datasets under our traffic signal settings protocol, we demonstrate that efficient pressure is complementary to both traditional and RL-based modeling for designing better TSC methods. Our code is released on GitHub.
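To illustrate, a small Python sketch of an efficient-pressure style quantity: the difference between the average queue length on a movement's incoming lanes and on its outgoing lanes, where averaging keeps the value comparable across movements with different lane counts. Treat the paper's definition as authoritative.

```python
def efficient_pressure(in_queues, out_queues):
    """Efficient-pressure style quantity for one traffic movement:
    average incoming queue length minus average outgoing queue length.
    Averaging (rather than summing, as in classic max pressure) keeps
    the value comparable across differing lane counts."""
    avg_in = sum(in_queues) / len(in_queues)
    avg_out = sum(out_queues) / len(out_queues)
    return avg_in - avg_out

# An Efficient-MP style controller would then serve the phase whose
# movements carry the largest total efficient pressure, e.g.:
# best_phase = max(phases, key=lambda p: sum(efficient_pressure(i, o)
#                                            for i, o in movements[p]))
```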
- Asia > China > Zhejiang Province > Hangzhou (0.06)
- Oceania > Australia > New South Wales > Wollongong (0.04)
- Asia > China > Gansu Province > Lanzhou (0.04)
- (2 more...)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
Homotopy Based Reinforcement Learning with Maximum Entropy for Autonomous Air Combat
Zhu, Yiwen, Fang, Zhou, Zheng, Yuan, Wei, Wenya
Intelligent decision-making for unmanned combat aerial vehicles (UCAVs) has long been a challenging problem. Conventional search methods can hardly satisfy the real-time demands of highly dynamic air combat scenarios. Reinforcement learning (RL) can significantly shorten decision time by using neural networks, but the sparse-reward problem limits its convergence speed, and artificial prior-experience rewards can easily deviate the policy from the optimal convergence direction of the original task, raising great difficulties for applying RL to air combat. In this paper, we propose a homotopy-based soft actor-critic method (HSAC) that addresses these problems by following the homotopy path between the original task with sparse reward and an auxiliary task with an artificial prior-experience reward. The convergence and feasibility of this method are also proved in this paper. To confirm the feasibility of our method, we first construct a detailed 3D air combat simulation environment for training RL-based methods, and we apply our method to both the attack-horizontal-flight UCAV task and the self-play confrontation task. Experimental results show that our method outperforms methods that use only the sparse reward or only the artificial prior-experience reward: the agent trained by our method reaches a win rate of more than 98.3% in the attack-horizontal-flight UCAV task and an average 67.4% win rate when confronted with agents trained by the other two methods.
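The homotopy idea lends itself to a compact sketch: blend the dense prior-experience reward with the sparse task reward and anneal the homotopy parameter over training. A linear schedule is shown purely for illustration; HSAC's actual path-following is more involved.

```python
def homotopy_reward(r_sparse, r_prior, step, total_steps):
    """Homotopy-style reward schedule: training starts on the auxiliary
    task with a dense, hand-crafted prior-experience reward and is
    annealed toward the original sparse-reward task. The linear schedule
    is an illustrative assumption."""
    lam = min(1.0, step / total_steps)   # homotopy parameter in [0, 1]
    return (1.0 - lam) * r_prior + lam * r_sparse
```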
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- North America > United States > Texas > Collin County > Plano (0.04)
- North America > United States > New Mexico (0.04)
- (2 more...)
- Government > Military (1.00)
- Leisure & Entertainment > Games > Computer Games (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)